DEgenes Hunter - Differential expression analysis report

Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
CTL_1
CTL_2
CTL_3
CTL_4

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
epm2a_1
epm2a_2
epm2a_3
epm2b_1
epm2b_2
epm2b_3
epm2b_4

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
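As a minimal illustration of this sign convention, assume the fold change is taken as treatment over control. The DE packages estimate fold changes from fitted models rather than from raw group means, so computing it from two means is a simplification. The function name and the counts below are hypothetical.

```python
import math


def log2_fold_change(control_mean: float, treatment_mean: float) -> float:
    """log2(treatment / control): positive when the treatment group expresses more."""
    return math.log2(treatment_mean / control_mean)


# Hypothetical mean normalized counts for two genes.
up_in_treatment = log2_fold_change(control_mean=50.0, treatment_mean=200.0)
down_in_treatment = log2_fold_change(control_mean=200.0, treatment_mean=50.0)
print(up_in_treatment, down_in_treatment)  # 2.0 -2.0
```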

Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing the expression levels of all genes between samples for i) all control samples, ii) all treatment samples and iii) all samples together.

These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Correlation between control samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.

Correlation between treatment samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
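The 0.96 rule of thumb can be checked pair by pair. The sketch below computes the Pearson coefficient for every pair of replicates in a group and flags the pairs below the threshold. The report does not say whether correlations are computed on raw or log-scaled values; the example assumes log-scale values, and the function names and data are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_low_correlation(samples, threshold=0.96):
    """Return (sample_a, sample_b, r) for every replicate pair with r below the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in itertools.combinations(samples.items(), 2):
        r = pearson(a, b)
        if r < threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical log-scale expression values for four genes in three control replicates.
controls = {
    "CTL_1": [1.0, 2.0, 3.0, 4.0],
    "CTL_2": [1.1, 2.1, 2.9, 4.2],
    "CTL_3": [4.0, 1.0, 3.5, 0.5],
}
for name_a, name_b, r in flag_low_correlation(controls):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Here CTL_3 would be the replicate worth a closer look, since both of its pairs fall below the threshold.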

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower

Principal Component Analysis

This is a PCA plot of the count values after the rlog (regularized log) transformation from the DESeq2 package:

Each sample is placed in the 2D plane according to its scores on the first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects, and for identifying outlier samples. Control samples are expected to cluster with each other, and treatment samples with each other.
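A minimal sketch of how sample coordinates on the first two principal components can be obtained. Two substitutions are assumptions, not what DESeq2 does: a plain log2(count + 1) transform stands in for rlog, and the components are found by power iteration with deflation on the sample-by-sample Gram matrix instead of a library PCA routine. Sample names are borrowed from this report, but the counts are hypothetical.

```python
import math
import random


def log_transform(counts):
    """Stand-in for DESeq2's rlog (not reproduced here): log2(count + 1)."""
    return [[math.log2(c + 1) for c in row] for row in counts]


def center_columns(matrix):
    """Subtract each gene's (column's) mean so the PCA is computed on centered data."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def top_components(matrix, k=2, iterations=500, seed=0):
    """PC scores for each sample (row) via power iteration on the Gram matrix X X^T."""
    x = center_columns(matrix)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [c / norm for c in w]
            eigenvalue = norm  # ||G v|| converges to the eigenvalue for a unit eigenvector
        scores.append([c * math.sqrt(eigenvalue) for c in v])
        # Deflate: remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores  # scores[0] holds PC1 per sample, scores[1] holds PC2


# Hypothetical raw counts: rows are samples, columns are genes.
samples = ["CTL_1", "CTL_2", "epm2a_1", "epm2a_2"]
counts = [
    [10, 100, 50],
    [12, 90, 55],
    [100, 10, 52],
    [110, 12, 48],
]
pc1, pc2 = top_components(log_transform(counts))
for name, a, b in zip(samples, pc1, pc2):
    print(f"{name}: PC1 = {a:.2f}, PC2 = {b:.2f}")
```

The sign of a principal component is arbitrary, so only the separation between groups along PC1 is meaningful, not which side each group lands on.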

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for the DESeq2 normalization method):

Before normalization:

After normalization:

Sample differences based on all normalized counts:

All counts were normalized with the DESeq2 algorithm. The normalized counts were then log10-transformed and plotted in a heatmap.
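A minimal sketch of the two steps, assuming per-sample size factors have already been estimated: each raw count is divided by its sample's size factor, then log10-transformed. The report does not state how zero counts are handled before taking log10; the pseudocount of 1 is an assumption, and the counts and size factors are hypothetical.

```python
import math


def normalize(counts, size_factors):
    """Divide each sample's raw counts by that sample's size factor."""
    return {s: [c / size_factors[s] for c in values] for s, values in counts.items()}


def log10_scale(normalized, pseudocount=1.0):
    """log10(count + pseudocount); the pseudocount is an assumption that keeps zeros finite."""
    return {s: [math.log10(v + pseudocount) for v in values] for s, values in normalized.items()}


# Hypothetical raw counts for three genes and hypothetical size factors.
raw = {"CTL_1": [0, 18, 198], "epm2a_1": [9, 36, 0]}
size_factors = {"CTL_1": 2.0, "epm2a_1": 1.0}
heatmap_values = log10_scale(normalize(raw, size_factors))
print(heatmap_values)
```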

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during the filtering process because they showed no or very low expression.
  • Prevalent DEG: Genes considered as differentially expressed (DE) by at least 3 packages, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.

This barplot shows the number of genes passing each stage of the analysis: from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by at least one package, to the genes detected as DE by at least 3 packages.
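The labelling rules and the barplot stages can be sketched as follows, using minpack_common = 3 as in this run. Two points are assumptions: each gene receives a single, most specific label (a prevalent DEG is not also counted as a possible DEG), and the barplot stages are cumulative. The function names, gene names and DE calls are hypothetical.

```python
from collections import Counter


def classify_gene(passed_filter, de_calls, minpack_common=3):
    """Label one gene from its filter status and per-package DE calls (package -> bool)."""
    if not passed_filter:
        return "FILTERED_OUT"
    n_packages = sum(de_calls.values())  # True counts as 1
    if n_packages >= minpack_common:
        return "PREVALENT_DEG"
    if n_packages >= 1:
        return "POSSIBLE_DEG"
    return "NOT_DEG"


def stage_counts(genes, minpack_common=3):
    """Cumulative barplot stages: total, filtered-in, DE in >= 1 package, DE in >= minpack_common."""
    labels = Counter(classify_gene(ok, calls, minpack_common) for ok, calls in genes.values())
    total = sum(labels.values())
    return {
        "total": total,
        "passed_filter": total - labels["FILTERED_OUT"],
        "de_in_at_least_one": labels["POSSIBLE_DEG"] + labels["PREVALENT_DEG"],
        "de_in_minpack_common": labels["PREVALENT_DEG"],
    }


# Hypothetical genes: (passed expression filter?, {package: called DE?}).
genes = {
    "gene_a": (True, {"DESeq2": True, "edgeR": True, "limma": True}),
    "gene_b": (True, {"DESeq2": True, "edgeR": False, "limma": False}),
    "gene_c": (True, {"DESeq2": False, "edgeR": False, "limma": False}),
    "gene_d": (False, {"DESeq2": False, "edgeR": False, "limma": False}),
}
print(stage_counts(genes))
```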

Package DEG detection stats

This is the Venn diagram of all possible DE genes (DEGs), i.e. genes called DE by at least one of the DE detection packages employed:
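Each region of such a diagram corresponds to one exact combination of packages. A minimal sketch that counts the regions by grouping every gene under the set of packages that called it DE, assuming each package's DE calls are available as a set of gene IDs (the sets and the function name are hypothetical):

```python
def venn_regions(deg_sets):
    """Map each combination of packages to the genes called DE by exactly that combination."""
    all_genes = set().union(*deg_sets.values())
    regions = {}
    for gene in all_genes:
        key = tuple(sorted(p for p, genes in deg_sets.items() if gene in genes))
        regions.setdefault(key, set()).add(gene)
    return regions


# Hypothetical DE calls per package.
deg_sets = {
    "DESeq2": {"g1", "g2", "g3"},
    "edgeR": {"g2", "g3", "g4"},
    "limma": {"g3", "g5"},
}
regions = venn_regions(deg_sets)
for combo, genes in sorted(regions.items()):
    print(" & ".join(combo), len(genes))
```

Because every gene lands in exactly one region, the region sizes add up to the number of possible DEGs.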

FDR gene-wise benchmarking

Benchmark of false positive calling (image extracted from the “padj_prevalent_DEGs.pdf” file):

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

The complete results of the DEgenes Hunter differential expression analysis can be found in the “hunter_results_table.txt” file in the Common_results folder

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 normalization effects:

This plot compares the effective library size with raw library size


The effective library size is the scaling factor (size factor) that the DESeq2 normalization algorithm computes for each sample. The effective library size is expected to depend on the raw library size.
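DESeq2's default size-factor estimator (used by `estimateSizeFactors`) is the median-of-ratios method. A minimal stdlib sketch follows. It assumes the report's effective library size corresponds to this per-sample size factor. The function name and counts are hypothetical, and genes with a zero count in any sample are skipped, as in the median-of-ratios definition.

```python
import math
import statistics


def median_of_ratios(counts):
    """counts: sample -> raw counts (same gene order). Returns sample -> size factor."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean per gene (the pseudo-reference sample); genes with any zero are left out.
    reference = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            reference[g] = sum(math.log(v) for v in values) / len(values)
    # Size factor = median ratio of the sample's counts to the reference, per gene.
    return {
        s: math.exp(statistics.median([math.log(counts[s][g]) - ref for g, ref in reference.items()]))
        for s in samples
    }


# Hypothetical counts: sample B was sequenced about twice as deeply as sample A.
counts = {"A": [10, 20, 40, 0], "B": [20, 40, 80, 5]}
size_factors = median_of_ratios(counts)
for sample, values in counts.items():
    print(sample, "raw library size:", sum(values), "size factor:", round(size_factors[sample], 3))
```

Deeper sequencing yields a larger size factor, which is why the effective library size should track the raw library size; a sample far off that trend is worth inspecting.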

DESeq2 MA plot:

This is the MA plot from DESeq2 package:

In DESeq2, the MA-plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points that fall outside the window are plotted as open triangles pointing either up or down.
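The plotting rules in this paragraph can be sketched as follows. x is the mean of normalized counts on a log scale (log10 here), and y is the log2 fold change. Points with an adjusted p-value below 0.1 are red, and fold changes outside the y window are clamped to its edge and marked as up or down triangles. This is not DESeq2's `plotMA` code. The window limits, the grey color for the remaining points, the shape names and the data are hypothetical.

```python
import math


def ma_points(results, alpha=0.1, ylim=(-2.0, 2.0)):
    """results: gene -> (mean of normalized counts, log2 fold change, adjusted p-value or None).

    Returns gene -> (x, y, color, shape).
    """
    points = {}
    for gene, (base_mean, lfc, padj) in results.items():
        if base_mean <= 0:
            continue  # no expression: cannot be placed on a log axis
        color = "red" if padj is not None and padj < alpha else "grey"
        if lfc > ylim[1]:
            y, shape = ylim[1], "open_triangle_up"
        elif lfc < ylim[0]:
            y, shape = ylim[0], "open_triangle_down"
        else:
            y, shape = lfc, "dot"
        points[gene] = (math.log10(base_mean), y, color, shape)
    return points


# Hypothetical DESeq2-style results.
results = {
    "g1": (150.0, 3.1, 0.001),
    "g2": (20.0, -0.4, 0.6),
    "g3": (1000.0, -2.5, 0.05),
    "g4": (0.0, 0.0, None),
}
for gene, point in ma_points(results).items():
    print(gene, point)
```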

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt

Differences between samples based on the normalized counts of PREVALENT DEGs:

Counts of prevalent DEGs were normalized with the DESeq2 algorithm. The normalized counts were then log10-transformed and plotted in a heatmap.

edgeR MA plot

This is the MA plot from edgeR package:

Differential gene expression data can be visualized as MA-plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black.

A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt

A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt

limma Volcano plot

Volcano plot of log2-fold change versus -log10 of adjusted p-values for all genes according to the analysis with limma:
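A minimal sketch of the volcano plot coordinates. x is the log2 fold change, and y is -log10 of the adjusted p-value. A gene is marked DE when it passes both cut-offs. This assumes the p_val_cutoff (0.05) and lfc (1) options listed at the end of this report are the thresholds applied. Whether the comparisons are strict or inclusive is not stated, so the choice below is an assumption, as are the function name and data.

```python
import math


def volcano_points(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """results: gene -> (log2 fold change, adjusted p-value). Returns gene -> (x, y, is_de)."""
    points = {}
    for gene, (lfc, padj) in results.items():
        y = -math.log10(max(padj, 1e-300))  # floor avoids log10(0) for extremely small p-values
        is_de = padj < p_cutoff and abs(lfc) >= lfc_cutoff
        points[gene] = (lfc, y, is_de)
    return points


# Hypothetical limma-style results.
results = {"g1": (2.3, 0.001), "g2": (0.4, 0.001), "g3": (-1.5, 0.2)}
for gene, (x, y, is_de) in volcano_points(results).items():
    print(gene, x, round(y, 2), "DE" if is_de else "not DE")
```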

A table containing the limma DEGs is provided in Results_limma/DEgenes_limma.txt

A table containing the limma normalized counts is provided in Results_limma/Normalized_counts_limma.txt

WGCNA Results

WGCNA was run to look for modules (clusters) of coexpressed genes. These modules were then compared with the sample factors to look for correlation. If no sample factors were specified, this comparison was performed with treatment/control labels.

The following plot shows the soft-thresholding power chosen for building the clusters. The power is chosen by examining the properties of the resulting network.
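For a signed network (WGCNA_blockwiseNetworkType signed in this run), WGCNA turns each gene-gene correlation r into an adjacency ((1 + r) / 2) raised to the chosen power. A minimal sketch of how that power shapes the network: higher powers shrink weak connections and lower the mean connectivity. WGCNA's actual selection criterion (scale-free topology fit) is not reproduced. The function names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_connectivity(expr, power):
    """Mean over genes of k_i = sum_j a_ij, with signed adjacency a_ij = ((1 + r_ij) / 2) ** power."""
    genes = list(expr)
    total = 0.0
    for g in genes:
        total += sum(((1 + pearson(expr[g], expr[h])) / 2) ** power for h in genes if h != g)
    return total / len(genes)


# Hypothetical expression profiles (5 samples) for four genes.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5],
}
powers = [1, 4, 8, 12]
ks = [mean_connectivity(expr, p) for p in powers]
for p, k in zip(powers, ks):
    print(f"power {p}: mean connectivity {k:.3f}")
```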

In total there were 290 clusters. The following plot shows the number of genes per cluster:

The following plots show the correlation between the different modules and specified factors. This is done using eigengenes, which can be broadly thought of as the average expression pattern for the genes in a given cluster. MEn refers to the eigengene for cluster n.
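WGCNA defines a module eigengene as the first principal component of the module's standardized expression. As noted above, it behaves broadly like the module's average expression pattern. The sketch below uses exactly that simplification (the mean of standardized gene profiles) and then correlates the result with a factor. The 0/1 coding of the control/treatment label, the function names and all values are hypothetical.

```python
import statistics


def standardize(values):
    """Population z-scores of a profile."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def approx_eigengene(module_profiles):
    """Average of the standardized gene profiles in a module (rough stand-in for the PC1 eigengene)."""
    scaled = [standardize(p) for p in module_profiles]
    return [statistics.fmean(column) for column in zip(*scaled)]


def pearson(x, y):
    """Pearson correlation as the mean product of population z-scores."""
    return statistics.fmean([a * b for a, b in zip(standardize(x), standardize(y))])


# Hypothetical module of two genes over three control and three treatment samples.
module_me1 = [
    [5, 6, 5, 20, 22, 21],
    [50, 48, 52, 90, 95, 93],
]
group = [0, 0, 0, 1, 1, 1]  # hypothetical coding: 0 = control, 1 = treatment
me1 = approx_eigengene(module_me1)
print(f"corr(ME1, group) = {pearson(me1, group):.2f}")
```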

Cluster assignment vs. highest module membership (MM)

This plot shows, for each gene, the cluster ID assigned by WGCNA vs. the cluster whose eigengene has the highest correlation with that gene (module membership, MM).
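A minimal sketch of the comparison for a single gene. Its module membership in each cluster is its correlation with that cluster's eigengene. The cluster with the highest membership is then compared with the WGCNA assignment. Taking the highest signed correlation (rather than the highest absolute one) is an assumption consistent with a signed network. The eigengenes, the gene and its assignment are hypothetical.

```python
import statistics


def standardize(values):
    """Population z-scores of a profile."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation as the mean product of population z-scores."""
    return statistics.fmean([a * b for a, b in zip(standardize(x), standardize(y))])


def best_module(gene_profile, eigengenes):
    """Module whose eigengene has the highest (signed) correlation with the gene."""
    membership = {m: pearson(gene_profile, eg) for m, eg in eigengenes.items()}
    return max(membership, key=membership.get)


# Hypothetical eigengenes over six samples and one gene's WGCNA assignment.
eigengenes = {
    "ME1": [-1.0, -1.1, -0.9, 1.0, 1.1, 0.9],
    "ME2": [1.2, 0.8, 1.0, -1.0, -0.9, -1.1],
}
gene_profile = [4, 5, 4, 15, 16, 14]
wgcna_assignment = "ME2"
mm_assignment = best_module(gene_profile, eigengenes)
print(f"WGCNA cluster: {wgcna_assignment}, highest-MM cluster: {mm_assignment}")
```

Genes where the two disagree, as in this example, appear off the diagonal of the plot.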

Cluster vs. factors correlation

This plot shows the direct correlation between clusters (eigengenes) and factors.

WGCNA eigengene clustering

WGCNA dendrogram showing the distances between the eigengenes and the factors. Distances were calculated from signed correlation, so the closer two elements are, the more positively correlated they are.

Eigengene clustering (absolute correlation)

WGCNA-like dendrogram showing the distances between the eigengenes and the factors. Distances were calculated from absolute correlation, so the closer two elements are, the stronger their correlation, whether positive or negative.
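The difference between the two dendrograms can be shown with two perfectly anti-correlated eigengenes. They are maximally far apart under a signed distance and identical under an absolute one. The report does not give its exact distance formulas; 1 - r and 1 - |r| are assumptions, and the eigengene values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_distance(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1 - pearson(x, y)


def absolute_distance(x, y):
    """1 - |r|: 0 whenever the profiles are perfectly correlated, positively or negatively."""
    return 1 - abs(pearson(x, y))


me_up = [1.0, 2.0, 3.0, 4.0]
me_down = [4.0, 3.0, 2.0, 1.0]
print(signed_distance(me_up, me_down), absolute_distance(me_up, me_down))
```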

Correlation between all clusters and factors

PCIT Results

Plots of some of the metrics computed in the PCIT implementation. A better explanation and more plots will be added at a later date:

Detailed comparison of package results

This is an advanced section that compares the outputs of the packages used to perform the data analysis. The data shown here does not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
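Each package applies its own multiple-testing adjustment (DESeq2, for instance, combines Benjamini-Hochberg with independent filtering). The following is only a minimal sketch of the plain Benjamini-Hochberg step-up adjustment that turns raw p-values into FDR values; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        # p * m / rank, made monotone by carrying the minimum from larger ranks.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw_p))
```

Adjusted values are never smaller than the raw ones, which is why the adjusted distribution is shifted towards 1 relative to the unadjusted one.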

FDR Correlations

Correlations between packages of the FDR-adjusted p-values and of the log fold changes.

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

opt_orig
input_file /mnt/scratch/users/bio_267_uma/elenarojano/NGS_projects/LaforaRNAseq/analysis/RNA_seq/DEGenesHunter_results/ctrl_vs_mut/final_counts.txt
reads 2
minlibraries 2
filter_type separate
output_files /mnt/scratch/users/bio_267_uma/elenarojano/NGS_projects/LaforaRNAseq/analysis/RNA_seq/DEGenesHunter_results/ctrl_vs_mut
p_val_cutoff 0.05
lfc 1
modules WDELP
minpack_common 3
target_file /mnt/home/users/bio_267_uma/elenarojano/projects/LaforaRNAseq/analysis/RNA_seq/DEG_workflow/TARGETS/ctrl_vs_mut_target.txt
model_variables
numerics_as_factors TRUE
custom_model FALSE
string_factors group
numeric_factors Lcn2_serum,Cxcl10_serum,WB_Lcn2_brain,WB_Cxcl10_brain,WB_Ccl5_brain,qPCR_Lcn2,qPCR_Mmp3,qPCR_Cxcl10,qPCR_H2-M2,qPCR_C3,qPCR_Ccl2,qPCR_Ccl5,qPCR_Wisp2,qPCR_Ccl12,qPCR_mir146a,qPCR_mir155,qPCR_Glra2,qPCR_Btg2,qPCR_Fos,qPCR_Nsun3
WGCNA_memory 5000
WGCNA_deepsplit 2
WGCNA_min_genes_cluster 15
WGCNA_detectcutHeight 0.995
WGCNA_mergecutHeight 0.1
WGCNA_all FALSE
WGCNA_blockwiseNetworkType signed
WGCNA_blockwiseTOMType signed
debug FALSE
Debug /mnt/scratch/users/bio_267_uma/elenarojano/NGS_projects/LaforaRNAseq/analysis/RNA_seq/DEGenesHunter_results/ctrl_vs_mut/debug_files/DH_debug_session.RData
help FALSE
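To reuse these settings programmatically, the two-column listing can be parsed into a dictionary. The minimal sketch below assumes one "name value" pair per line, converts R-style TRUE/FALSE and numbers, and keeps empty values (such as model_variables) as empty strings. The function name is hypothetical. The sample lines are copied from the listing above, with numeric_factors shortened.

```python
def parse_options(text):
    """Parse 'name value' lines into a dict, converting R-style TRUE/FALSE and numbers."""
    options = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        value = value.strip()
        if value in ("TRUE", "FALSE"):
            options[name] = value == "TRUE"
            continue
        for convert in (int, float):
            try:
                options[name] = convert(value)
                break
            except ValueError:
                pass
        else:
            options[name] = value  # keep paths, strings and empty values as text
    return options


table = """
reads 2
minlibraries 2
filter_type separate
p_val_cutoff 0.05
lfc 1
modules WDELP
minpack_common 3
model_variables
numerics_as_factors TRUE
numeric_factors Lcn2_serum,Cxcl10_serum,qPCR_Fos
WGCNA_mergecutHeight 0.1
"""
opts = parse_options(table)
numeric_factors = opts["numeric_factors"].split(",")
print(opts["minpack_common"], opts["p_val_cutoff"], numeric_factors)
```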